Circular clustering of protein dihedral angles by Minimum Message Length.

Authors

  • D L Dowe
  • L Allison
  • T I Dix
  • L Hunter
  • C S Wallace
  • T Edgoose
Abstract

Early work on proteins identified the existence of helices and extended sheets in protein secondary structures, a high-level classification which remains popular today. Using the Snob program for information-theoretic Minimum Message Length (MML) classification, we take the protein dihedral angles as determined by X-ray crystallography and cluster sets of dihedral angles into groups. Previous work by Hunter and States applied a similar Bayesian classification method, AutoClass, to protein data with site position represented by three Cartesian co-ordinates for each of the alpha-Carbon, beta-Carbon and Nitrogen, totalling nine co-ordinates. By using the von Mises circular distribution in the Snob program, we are instead able to represent local site properties by the two dihedral angles, phi and psi. Since each site can be modelled as having two degrees of freedom, this orientation-invariant dihedral angle representation of the data is more compact than that of nine highly correlated Cartesian co-ordinates. Under the information-theoretic message length concepts discussed in the paper, such a more concise model is more likely to represent the underlying generating process from which the data came. We report on the results of our classification, plot the classes in (phi, psi) space, and introduce a symmetric information-theoretic distance measure to build a minimum spanning tree between the classes. We also give a transition matrix between the classes and note the existence of three classes in the region phi approximately -1.09 rad and psi approximately -0.75 rad which are close on the spanning tree and have high inter-transition probabilities. This gives rise to a tight, abundant and self-perpetuating structure.
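To make the role of the von Mises circular distribution concrete, here is a minimal Python sketch of its density and of a circular mean for angles in radians. It is an illustration only and not the Snob implementation: the function names, the series-based Bessel routine and the concentration value 4.0 are all assumptions chosen for the example. Only the angles near phi = -1.09 rad and psi = -0.75 rad are taken from the abstract.

```python
import math


def bessel_i0(x, terms=100):
    """Modified Bessel function I0(x) from its power series (hypothetical helper)."""
    total = 0.0
    term = 1.0  # k = 0 term of sum_k ((x/2)^(2k)) / (k!)^2
    for k in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((k + 1) ** 2)
    return total


def von_mises_pdf(theta, mu, kappa):
    """Density of a von Mises distribution with mean direction mu, concentration kappa."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))


def circular_mean(angles):
    """Mean direction of a set of angles, respecting wrap-around at +/- pi."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c)


# Illustrative use: density of a (phi, psi) pair under two independent
# von Mises components centred on the region named in the abstract.
# kappa = 4.0 is an arbitrary assumed concentration, not a fitted value.
phi, psi = -1.0, -0.8
density = von_mises_pdf(phi, -1.09, 4.0) * von_mises_pdf(psi, -0.75, 4.0)
```

Because the density depends on the angle only through a cosine, it is periodic: an angle and the same angle plus a full turn receive identical density, which a Gaussian on the raw angle values would not guarantee.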

Related resources

Using Information Theory to Discover Side Chain Rotamer Classes: Analysis of the Effects of Local Backbone Structure

An understanding of the regularities in the side chain conformations of proteins and how these are related to local backbone structures is important for protein modeling and design. Previous work using regular secondary structures and regular divisions of the backbone dihedral angle data has shown that these rotamers are sensitive to the protein's local backbone conformation. In this preliminar...

Calculations of Dihedral Groups Using Circular Indexation

In this work, a regular polygon with $n$ sides is described by a periodic (circular) sequence with period $n$. Each element of the sequence represents a vertex of the polygon. Each symmetry of the polygon is the rotation of the polygon around the center-point and/or flipping around a symmetry axis. Here each symmetry is considered as a system that takes an input circular sequence and g...

Dihedral angle entropy measures for intrinsically disordered proteins.

Protein stability is based on a delicate balance between energetic and entropic factors. Intrinsically disordered proteins (IDPs) interacting with a folded partner protein in the act of binding can order the IDP to form the correct functional interface by decrease in the overall free energy. In this work, we evaluate the part of the entropic cost of ordering an IDP arising from their dihedral s...

Real-value and confidence prediction of protein backbone dihedral angles through a hybrid method of clustering and deep learning

Background. Protein dihedral angles provide a detailed description of protein local conformation. Predicted dihedral angles can be used to narrow down the conformational space of the whole polypeptide chain significantly, thus aiding protein tertiary structure prediction. However, direct angle prediction from sequence alone is challenging. Method. In this study, we present a novel method to pre...

Online clustering via finite mixtures of Dirichlet and minimum message length

This paper presents an online algorithm for mixture model-based clustering. Mixture modeling is the problem of identifying and modeling components in a given set of data. The online algorithm is based on unsupervised learning of finite Dirichlet mixtures and a stochastic approach for estimates updating. For the selection of the number of clusters, we use the minimum message length (MML) approac...

Journal title:
  • Pacific Symposium on Biocomputing

Publication date: 1996